* ABSTRACT

Most information retrieval systems on the Internet rely primarily on similarity ranking algorithms
based solely on term frequency statistics. Information quality is usually ignored, so documents are
retrieved without regard to their quality. We present an approach that combines similarity-based
ranking with quality ranking in centralized and distributed search
were incorporated in centralized search. The improvement seen when the availability, information-to-
significant. Finally, incorporating the popularity metric in information fusion resulted in a significant
improvement. In summary, the results show that incorporating quality metrics can generally improve 
search effectiveness in both centralized and distributed search environments. 
* ABSTRACT

We seek to gain improved insight into how Web search engines should cope with the evolving Web, in
an attempt to provide users with the most up-to-date results possible. For this purpose we
collected weekly snapshots of some 150 Web sites over the course of one year, and measured the
evolution of content and link structure. Our measurements focus on aspects of potential interest to
search engine designers: the evolution of link structure over time, the rate of creation of new pages
and new distinct content on the Web, and the rate of change of the content of existing pages under
search-centric measures of degree of change. Our findings indicate a rapid turnover rate of Web
pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the
hyperlinks that connect them. For pages that persist over time we found that, perhaps surprisingly,
the degree of content shift as measured using TF.IDF cosine distance does not appear to
be consistently correlated with the frequency of content updating. Despite this apparent
non-correlation, the rate of content shift of a given page is likely to remain consistent over time. That is,
pages that change a great deal in one week will likely change by a similarly large degree in the
following week. Conversely, pages that experience little change will continue to experience little
change. We conclude the paper with a discussion of the potential implications of our results for the
design of effective Web search engines.